Search Results: "erich"

15 June 2012

Erich Schubert: Dritte Startbahn - 2 gewinnt!

Usually I don't post much about politics. And this is even a highly controversial issue. Please do not feel offended.
This weekend, there is an odd election in Munich. Outside of Munich, near the cities of Freising and Erding, lies the Munich airport. The company operating the airport is partially owned by the city of Munich, which gives the city a veto option.
The Munich airport has grown a lot. Everybody who has flown a bit knows that big airports (such as Heathrow) are often the worst. If anything goes wrong, you are stuck, because it will take them a long time to resume operations. This just happened to me in Munich, where the luggage system was down and no luggage made it onto the airplanes.
Yet, they want to take the airport further down this road and make it even bigger: add two satellite terminals and a third runway. I'm convinced that this will make the airport much worse for anybody around here. The security checkpoints will be even more crowded, the lines for the baggage drop-off too, and you will have to walk much further inside the airport.
Up to now, the Munich airport has been pretty good compared to others, in particular given that it is one of the largest in Europe! That is because it was designed from the ground up for this size. Now they plan to screw it up.
But there are other arguments against this besides the egoistic view of a traveller. The first is the money issue. The airport is continually making losses. It's the taxpayer who has to pay for all of this - and the current cost estimate is 1200 million. This is not appropriate, in particular since history shows that you can multiply such numbers by 2 to 10 to get the real figure. They should first get the airport into a financially stable condition, then plan on making it even bigger.
Then there are the numbers. Just like with any large-scale political project, the numbers are all fake. The current airport was planned to cost 800 million; in the end it was about 8550 million. The politicians happily lie to us, because they want to push their pet projects. We must no longer accept such projects based on fake numbers and outdated predictions.
If you are already one of the 10 largest airports in Europe, can you really expect to grow even further? There is a natural limit to growth, unless you want every single passenger in the world to first travel to Munich multiple times before going to their final destination ...
One thing they seem to have completely neglected is that Berlin is currently getting a brand new airport. Very likely, this is going to divert quite some traffic away from Munich, just like the Munich airport diverted a lot of traffic away from Frankfurt. To some extent this is because many people actually want to go to Berlin, not Munich, but currently have to change planes here or in Frankfurt. So when Berlin is finally operational, this will have an impact on Munich.
And speaking of the Berlin airport, it is a good example of why not to trust the numbers and our politicians. It is another way-over-budget, way-behind-schedule project that the politicians screwed up badly and where they lied to us. If we should not have trusted them with Berlin, why should we trust them with the Munich add-on?
A lot of people whose families have been living there for years will have to be resettled. Whole towns are going to disappear; an airport is huge. Yet they cannot vote against it, because their small towns do not own shares of the airport. The politicians don't even talk to them, not even to their political representatives.
Last but not least, the airport is in a sensitive ecological area. The directly affected area is a European special protection area for wild birds. There are nature preserves nearby, and all this area already suffers badly from airport drainage, pollution and noise. When they built the airport originally, the replacement areas they set up were badly done, and are mostly inhabited by nettles and goldenrod (which is not even native to Europe). See this article in the Süddeutsche Zeitung on the impact on nature. You can't replace the loss of the original habitats just by digging some pools and filling them with water ...
If you want more information, go to this page, by Bund Naturschutz.
This is not about progress ("Fortschritt"). That is a killer argument the politicians love, but it doesn't hold. Nobody is trying to shut down the airport. Munich will be better off by keeping the balance: having both a reasonably sized airport (and in fact, the airport is already one of the 10 largest in Europe!) and preserving some nature to make it worth living here.
If you are located in Munich, please go vote against the airport extension, and enjoy the DEN-GER soccer game afterwards. Thank you.

9 June 2012

Erich Schubert: DMOZ dying

Sometime in the late 1990s I became a DMOZ editor for my local area. At that time, when the internet was a niche thing and I was still a kid, I was actually operating a web site for a non-profit organization that had a similar goal as the corresponding category.
In the following years, I would occasionally log in and try to review some pages. It was a really scary experience: it was still exactly the same web 0.5 experience. You had a spreadsheet type of view, tons of buttons, and it would take like 10 page loads just to review a single site. A lot of the time, you would end up searching for a more appropriate category, copying the URL, replacing some URL-encoded special characters, and pasting it into one out of 30 fields on the form just to move a suggested site to a more appropriate category. Most of the edits were by bots that detected a dead link and disabled it by moving it to the review stage. At the same time, every SEO manual said you need to be listed on DMOZ, so people would mass-submit all kinds of stuff to DMOZ in any category it could in any way fit in.
Then AOL announced DMOZ 2.0, and everybody probably thought: about time to refresh the UI and make everything more usable. But it didn't happen. First of all, it came late (announced in 2008, actually delivered sometime in 2010); then it was incredibly buggy in the beginning. They re-launched 2.0 at least two times. For quite some time, editors were unable to log in.
When DMOZ 2.0 came, my account was already "inactive", but I was able to get it re-activated. And it still looked the same. I read they changed from Times to Arial, and probably changed some CSS. But other than that, editing links was still as complicated as could be. So I made just a few changes and then largely lost interest again.
During the last year, I tried to give it another chance multiple times. But my account had expired again, and I never got a reply to my reinstatement request.
A year ago, Google Directory - the most prominent use of DMOZ/ODP data, although its users were totally unaware of it - was finally discontinued, too.
So by now, DMOZ seems to be as dead as it can get (they don't even bother to answer former contributors who want to get reinstated). The links are old, and if it weren't for bots disabling dead sites, it would probably look like an internet graveyard. But this poses an interesting question: will someone come up with a working "web 2.0 social" idea of the "directory" concept (I'm not talking about Digg and these classic "social bookmarking" dead ducks)? Something that strikes the right balance between, on one hand, the web page admins (and the SEO gold diggers) being allowed to promote their sites (and keep the data accurate) and, on the other, crowd-sourcing the quality control, while also opening the data? To some extent, Facebook and Google+ can do this, but they're largely walled gardens. And they don't have real social quality assurance; money is key there.

31 May 2012

Russell Coker: Links May 2012

Vijay Kumar gave an interesting TED talk about autonomous UAVs [1]. His research is based on helicopters with 4 sets of blades, and his group has developed software to allow them to develop maps, fly in formation, and more. Hadiyah wrote an interesting post about networking at TED 2012 [2]. It seems that giving every delegate the opportunity to have their bio posted is a good conference feature that others could copy. Bruce Schneier wrote a good summary of the harm that post-9/11 airport security has caused [3]. Chris Neugebauer wrote an insightful post about the drinking culture at conferences, how it excludes people and distracts everyone from the educational purpose of the conference [4]. Matthew Wright wrote an informative article for Beyond Zero Emissions comparing current options for renewable power with the unproven plans for new nuclear and fossil fuel power plants [5]. The Free Universal Construction Kit is a set of design files to allow 3D printing of connectors between different types of construction kits (Lego, Fischertechnik, etc) [6]. Jay Bradner gave an interesting TED talk about the use of Open Source principles in cancer research [7]. He described his research into drugs which block cancer by converting certain types of cancer cell into normal cells, and how he shared that research to allow the drugs to be developed for clinical use as fast as possible. Christopher Priest wrote an epic blog post roasting everyone currently associated with the Arthur C. Clarke awards; he took particular care to flame Charles Stross, who celebrated the prestige of such a great flaming by releasing a t-shirt [8]. For a while I've been hoping that an author like Charles Stross would manage to make more money from t-shirt sales than from book sales; Charles is already publishing some of his work for free on the Internet, and it would be good if he could publish it all for free. Erich Schubert wrote an interesting post about the utility and evolution of Facebook likes [9].
Richard Hartmann wrote an interesting summary of the problems with Google products that annoy him the most [10]. Sam Varghese wrote an insightful article about the political situation in China [11]. The part about the downside of allowing poorly educated people to vote seems to apply to the US as well. Sociological Images has an article about the increased rate of Autism diagnosis as social contagion [12]. People who get their children diagnosed encourage others with similar children to do the same. Vivek wrote a great little post about setting up WPA on Debian [13]. It was much easier than expected once I followed that post. Of course I probably could have read the documentation for ifupdown, but who reads docs when Google is available? Related posts:
  1. Links March 2012 Washington's Blog has an informative summary of recent articles about...
  2. Links April 2012 Karen Tse gave an interesting TED talk about how to...
  3. Links February 2012 Sociological Images has an interesting article about the attempts to...
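As a pointer for anyone curious about the WPA-on-Debian link above: the classic ifupdown-based setup typically boils down to a stanza like the following in /etc/network/interfaces, with the wpasupplicant package installed (the interface name, SSID and passphrase here are placeholders, not taken from the linked post):

```
auto wlan0
iface wlan0 inet dhcp
    # these options are handled by the wpasupplicant ifupdown hooks
    wpa-ssid YourNetworkName
    wpa-psk YourSecretPassphrase
```

After saving the file, `ifup wlan0` should associate and request a DHCP lease.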

22 April 2012

Iustin Pop: Running: Zürich city run - rain, rain, rain

This year was the 10th anniversary of the Zürich marathon, so they had a new category: a 10km city run (at least, I think it was a new category). So I went along, thinking 10k would be a nice Sunday run. But oh, even the famed Swiss organisational skills fail sometimes. And when they fail, they fail hard.
First, if you have 3 events all going on at the same time (marathon, team run, city run), please, please mark the route appropriately. The finish line was marked only "marathon" and "team run", so I didn't even know when I finished; I only realised "well, that must have been it" when I reached the place where I gave back the transponder chip. This marking issue was ongoing for the last ~500-600 metres: I thought I was lost when running along a road marked (again) only with "marathon" and "team run", and almost left the road to look for my actual trail. Of course, if I had remembered that the finish is common to all three categories, all would have been good, but don't ask me to remember such stuff at the end of a race.
Second, clothes storage. They had clothes storage organised nicely, in railroad freight cars, 500 people per car. This went well, except that they underestimated the number of people registering, so the last car had well over 700 bags, which created a huge bottleneck for this single car - which of course was where I had my clothes (as I registered late). So I ran for ~1 hour and then spent 30 minutes getting my clothes back. This wouldn't have been an issue if the weather had been fine. But it started raining seriously 500 metres into the race, and by the time I finished I was already fully drenched, in just a t-shirt (thankfully with long pants), and waiting in the rain for 30 minutes was painful. I think it was also unhealthy, due to the abrupt switch from running to just standing in the rain. I haven't been so cold in a long while.
By the time I got home my hands were shaking so hard I couldn't unzip my jacket (and I couldn't stop shaking, which is not a good sign). So yeah, I learned a few things. Try not to use the clothes storage if you can (it was the first time I used it), especially if you can pick up the stuff they give you a day early (I could have gone to the race directly dressed), and I hope the organisers learned to provide better conditions in case it rains. Which in Zürich happens, well, quite often. People were joking in the line: "1 hour running, 1 hour picking up the clothes, 1 week sick". I hope I won't get sick (a hot tea after a hot bath can do wonders), but still, at least my voice will sound like after a whole night of partying.
OK, rant over. What about the race? It's nice (except the very dark overcast and rain, as said), through the centre of the city; as a friend says: "Running on Bahnhofstrasse is cool". Otherwise, 10k seems to go quite quickly nowadays. Having ~1,600 people all in red t-shirts was funny, especially if you're trying to look for friends. It was a very diverse crowd; you could see more varied people than in the other races (where more running-focused people seem to come). Here there were all kinds of people: children, teenagers, adults and so on, with no per-category start or such. As for myself, I am still bothered by some small injuries, so I can't fully enjoy the nice result I got: 47m26.4s, for 4m44s/km! Yay! Of course, the reason I ran so fast was that I was cold and needed to run to warm up. Also, again a flat race; I should start making friends with hills. And because it was a more diverse crowd, for the first time I didn't place (in my age category) around ~66% down the list, but only around ~44% (~13 minutes slower than the winner). Yay!

11 April 2012

Erich Schubert: Are likes still worth anything?

When Facebook became "the next big thing", "like" buttons popped up on various web sites. And of course "going viral" was the big thing everybody talked about, in particular SEO experts (or those who would like to be).
But things have changed. In particular Facebook has. In the beginning, any "like" would be announced in the newsfeed to all your friends. This was what allowed likes to go viral, when your friends re-liked the link. This is what made it attractive to have like buttons on your web pages. (Note that I'm not referring to "likes" of a single Facebook post; they are something quite different!)
Once everybody "knew" how important this was, everybody tried to make the most out of it - in particular scammers, viruses and SEO people. Every other day, some clickjacking application would flood Facebook with likes. Every backwater website was trying to get more audience by getting "liked". But at some point Facebook just stopped showing "likes". This is not bad; it is the obvious reaction when people get too annoyed by the constant "like spam". Facebook had to put an end to it.

But by now a "like" is pretty much worthless (in my opinion). Still, many people following "SEO tutorials" are all crazy about likes. Instead, we should reconsider whether we really want to slow down our sites' loading by having like buttons on every page. A like button is not as lightweight as you might think: it is a complex piece of JavaScript that tries to detect clickjacking attacks, and in fact invades your users' privacy, up to the point where, for example in Germany, it may even be illegal to use the Facebook like button on a web site.
In a few months, the SEO people will realize that "like"s are a fad now, and will likely all try to jump on the Google+ bandwagon. Google+ is probably not half as much of a "dud" as many think it is (because their friends are still on Facebook, and because you cannot scribble birthday wishes on a wall in Google+). The point is that Google can actually use the "+1" likes to improve everyday search results. Google for something a friend liked, and it will show up higher in the search results, and Google will show the friend who recommended it. Facebook cannot do this, because it is not a search engine (well, you can use it for searching for people, although Ark is probably better at this, and nobody searches for people anywhere near as often as they do regular web searches). Unless they enter a strong partnership with Microsoft Bing or Yahoo, the "like"s can never be as important as Google "+1" likes. So don't underestimate the Google+ strategy in the long run.
There are more points where Facebook is by now much less useful than it used to be. For example, event invitations. When Facebook was in full growth, you could essentially invite all your friends to your events. You could also use lists to organize your friends, and invite only the appropriate subset, if you cared enough. The problem again was: nobody cared enough. Everybody would just invite all their friends, and you would end up getting "invitation spam" several times a day. So again Facebook had to change and limit the invitation capabilities. You can no longer invite everyone, or even just everyone on one particular list. There are some tools and tricks that can work around this to some extent, but once everybody uses them, Facebook will just have to cut it down even further.
Similarly, you might remember "superpoke" and all the "gift" applications. Facebook (and the app makers) probably made a fortune on them with premium pokes and gifts. But then this too reached a level that started to annoy the users, so Facebook had to cut down the ability of applications to post to walls. And boom, this segment essentially imploded. I haven't seen numbers on Facebook gaming, and I figure that by doing some special setup for the games Facebook managed to keep them somewhat happy. But many will remember the time when the newsfeed was full of Farmville and Mafia Wars crap ... it just doesn't work that way any longer.

So when working with Facebook and the like, you really need to be on the move. Right now it seems that groups and applications are more useful for getting that viral dream going. A couple of apps such as Yahoo's currently require you to install their app (which then may post to your wall on your behalf and get your personal information!) to follow a link shared this way, and can then actively encourage you to reshare. And messages sent to a "Facebook group" are more likely to reach people who aren't direct friends of yours. When friends actually "join" an event, this currently shows up in the news feed. But all of this can change with zero days' notice.
It will be interesting to see whether Facebook can keep up in the long run with Google's ability to integrate the +1 likes into search results. It probably takes just a few success stories in the SEO community for getting +1s instead of Facebook likes to become the "next big thing" in SEO. Then Google just has to wait for them to virally spread +1 adoption. Google can wait - its Google+ growth rates aren't bad, and it already has a working business model that doesn't rely on the extra growth - it is big already and makes good profits.
Facebook however is walking a surprisingly thin line. It needs tight control of the amount of data shared (which is probably why it tries to do this with "magic"). People don't want to have the impression that Facebook is hiding something from them (although it is in fact suppressing a huge part of their friends' activity!), but they also don't want all this data spammed at them. And in particular, Facebook needs to give the web publishers and app developers the right amount of extra access to the users, while in turn keeping the major spam away from the users.

Independent of the technology and actual products, it will be really interesting to see whether we manage to find some way to keep the balance in "social" one-to-many communication right. It's not Facebook's fault that many people "spam" all their friends with all their "data". Google's Circles probably aren't the final answer either. The reason email still works rather well is probably that it makes one-to-one communication easier than one-to-many, that it isn't realtime, and that people expect you to put enough effort into composing your mails and choosing the right recipients for the message. Current "social" communication is pretty much posting everything to everyone you know, addressed "to whom it may concern". Much of it is in fact pretty impersonal or even non-social. We have definitely reached the point where more data is shared than is being read. Twitter is probably the most extreme example of a "write-only" medium: the average number of times a tweet is read by a human other than the original poster must be way below 1, and definitely much less than the average number of "followers".
So in the end, the answer may actually be a good automatic address book, with automatic groups and rich clients, to enable everybody to use email more efficiently. On the other hand, separating "serious" communication from "entertainment" communication may well be worth a separate communications channel, and email definitely is dated and has spam problems.

15 March 2012

Erich Schubert: ELKI applying for GSoC 2012

I've submitted an organization application to the Google Summer of Code 2012 for ELKI - an open source data mining framework I'm working on.
ELKI
I hope I can get spots for 1-2 students to help implement additional state-of-the-art methods, to allow for even broader comparisons. Acceptance notifications will be sent tomorrow. I have no idea how high our chances of getting accepted are. We're open source, and as part of the university we have a proven record of educating students (in fact, around two dozen have contributed to ELKI by now, although not all of this has been "released" yet). We're rather small compared to e.g. Gnome, Debian or Apache, and just a few years old. But I believe we are trying to fill an important gap in the interaction between research and open source: way too many algorithms get published in science but are never made available as source code to test and compare. This is where we try to step in: make many largely overlooked methods available. Of course we also have k-means (who hasn't?), but there is much more than just k-means! And there are plenty of methods often cited in the scientific literature for which nobody seems to have a working implementation ...

While most people equate data mining with prediction and classification (both of which are actually more machine learning topics), ELKI is strong on cluster analysis and outlier detection, as well as index structures. Plus, it is much more flexible than other frameworks. For example, we allow almost arbitrary combinations of algorithms and distance functions. So with our tool, you can easily test your own distance function by plugging it into various algorithms. Otherwise we could just have used Weka.
The other reason why we did not extend Weka (or use R) is that we wanted not just to implement some algorithms, but to be able to study the effects of index structures on the algorithms. And in my opinion, this is actually the key difference between true data mining and ML, AI or "statistics". In ML and statistics, the key objective is result quality, which is often rather easy to measure, too. For "full" data mining, one also needs to consider all the issues of actually managing the data and indexing it to accelerate computations. Plus, the prime objective is to discover something new that you did not know before.
Of course these things cannot be completely separated. You can of course discover patterns and rules in the data that will allow you to make good predictions. But data mining is not just prediction and classification!
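The pluggable design described above - algorithm code that depends only on a distance interface, so a custom distance can be swapped in - can be sketched roughly like this. This is illustrative Java only, not the actual ELKI API; all class and method names here are made up:

```java
// Sketch of a pluggable distance design (hypothetical names, not ELKI's API).
interface Distance {
    double distance(double[] a, double[] b);
}

class EuclideanDistance implements Distance {
    public double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}

class ManhattanDistance implements Distance {
    public double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }
}

// A toy "algorithm" that only needs *some* distance: assign a point
// to its nearest medoid. Any Distance implementation plugs in unchanged.
class NearestMedoid {
    static int assign(double[] point, double[][] medoids, Distance dist) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < medoids.length; i++) {
            double d = dist.distance(point, medoids[i]);
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }
}

public class Demo {
    public static void main(String[] args) {
        double[][] medoids = { { 0, 0 }, { 10, 10 } };
        double[] p = { 2, 3 };
        // The same algorithm runs with either distance plugged in.
        System.out.println(NearestMedoid.assign(p, medoids, new EuclideanDistance()));
        System.out.println(NearestMedoid.assign(p, medoids, new ManhattanDistance()));
    }
}
```

Writing a new distance is then just one more small class implementing the interface, and every algorithm written against the interface can use it immediately.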

The next release of ELKI, 0.5, will be out in early April. ELKI 0.4 is already available in Debian testing and Ubuntu. The next release focuses on comparing clustering results, including an experimental visualization of clustering differences, which will be presented at the ICDE 2012 conference.
On the big to-do list is a lot of stuff. In particular, ELKI is currently mostly useful for research, as it is too difficult to use for the average business user wanting to do data mining. You can think of it more as a prototyping system: try out what works for you, then implement that within your larger project in an integrated and optimized way. The other big thing that I'm unhappy with is the visualization speed. SVG is great for print export, but Batik can be really slow, and the XML DOM API isn't exactly "accessible" to students wanting to add new visualizations. Right now, it is all useful and okay for my kind of experiments, but it could be so much more if we could solve the speed (and memory) issues of the pure SVG-based approach. I'd love to see something much faster here, but with an SVG export for editing and printing.

14 March 2012

Erich Schubert: Google Scholar, Mendeley and unreliable sources

Google Scholar and Mendeley need to do more quality control.
Take for example the article
A General Framework for Increasing the Robustness of PCA-Based Correlation Clustering Algorithms
Hans-Peter Kriegel, Peer Kröger, Erich Schubert, Arthur Zimek
Scientific and Statistical Database Management (SSDBM 2008)
(part of my diploma thesis).
Apparently, someone screwed up entering the data into Mendeley and added the editors to the authors. Now Google has happily imported this data into Google Scholar, and keeps reporting the authors incorrectly, too. Of course, many people will in turn import this incorrect data into their BibTeX files, upload it to Mendeley, and so on ...
Yet neither Google Scholar nor Mendeley has an option for reporting such an error. They don't even realize that maybe SpringerLink - where the DOI points - is the more reliable source.
On the contrary, Google Scholar just started suggesting to me that the editors are coauthors ...
They really need to add an option to fix such errors. There is nothing wrong with having errors in gathered data, but you need to have a way of fixing them.

25 January 2012

Russell Coker: SE Linux Status in Debian 2012-01

Since my last SE Linux in Debian status report [1] there have been some significant changes.
Policy: Last year I reported that the policy wasn't very usable; on the 18th of January I uploaded version 2:2.20110726-2 of the policy packages, which fixes many bugs. The policy should now be usable by most people for desktop operations and as a server. Part of the delay was that I wanted to include support for systemd, but as my work on systemd proceeded slowly and others didn't contribute policy I could use, I gave up and just released it. Systemd is still a priority for me and I plan to use it on all my systems when Wheezy is released.
Kernel: Some time between Debian kernel 3.0.0-2 and 3.1.0-1, support for an upstream change to the security module configuration was incorporated. Instead of using selinux=1 on the kernel command line to enable SE Linux support, the kernel option is now security=selinux. This change allows people to boot with security=tomoyo or security=apparmor if they wish. No support for Smack though. The kernel silently ignores command-line parameters that it doesn't understand, so there is no harm in having both selinux=1 and security=selinux on both older and newer kernels. So version 0.5.0 of selinux-basics now adds both kernel command-line options to the GRUB configuration when selinux-activate is run. Also, when the package is upgraded it will search for selinux=1 in the GRUB configuration, and if it's there it will add security=selinux. This will give users the functionality that they expect: systems which have SE Linux activated will keep running SE Linux after a kernel upgrade or downgrade! Prior to updating selinux-basics, systems running Debian/Unstable won't work with SE Linux. As an aside, the postinst file for selinux-basics was last changed in 2006 (thanks Erich Schubert). This package is part of the new design of SE Linux in Debian, and some bits of it haven't needed to be changed for 6 years!
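The upgrade logic described above can be sketched in a few lines of shell. This is an illustrative sketch of the idea, not the actual selinux-basics code; the function name is made up:

```shell
# Sketch: if the old selinux=1 option is present in the GRUB defaults file
# but the newer security=selinux option is not, append it. Both are then
# passed to the kernel, and whichever option a given kernel doesn't
# understand is silently ignored.
add_security_selinux() {
    conf="$1"   # e.g. /etc/default/grub
    if grep -q 'selinux=1' "$conf" && ! grep -q 'security=selinux' "$conf"; then
        sed -i 's/selinux=1/selinux=1 security=selinux/' "$conf"
        # on a real system, run update-grub afterwards to regenerate grub.cfg
    fi
}
```

Running it a second time changes nothing, since the `security=selinux` check makes the edit idempotent.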
SE Linux isn't a new thing; it's been in production for a long time.
Audit: While the audit daemon isn't strictly a part of SE Linux (each can be used without the other), it seems that most of the time they are used together (in Debian at least). I have prepared an NMU of the new upstream version of audit and uploaded it to delayed/7. I want to get everything related to SE Linux up to date, or at least to versions comparable with Fedora. Also, I sent some of the Debian patches for auditd upstream, which should reduce the maintenance effort in future.
Libraries: There have been some NMUs of libraries that are part of SE Linux. Due to a combination of having confidence in the people doing the NMUs and not having much spare time, I have let them go through without review. I'm sure that I will notice soon enough if they don't work; my test systems exercise enough SE Linux functionality that it would be difficult to break things without me noticing.
Play Machine: I am now preparing a new SE Linux Play Machine running Debian/Unstable. I wore my Play Machine shirt at LCA, so I've got to get one going again soon. This is a good exercise of the strict features of SE Linux policy, and I've found some bugs which need to be fixed. Running Play Machines really helps improve the overall quality of SE Linux. Related posts:
  1. Status of SE Linux in Debian LCA 2009 This morning I gave a talk at the Security mini-conf...
  2. SE Linux in Debian I have now got a Debian Xen domU running the...
  3. Debian SE Linux Status At the moment I've got more time to work on...

5 January 2012

Jaldhar Vyas: Jeopardy!: The Post-Mortem

By now hopefully you have had a chance to see my Jeopardy! episode, so it won't be much of a spoiler to mention that I didn't win. It went like this. I flew into LA on the night of October 31st. The taping was to take place over the next two days. Jeopardy! tapes two weeks of shows, two days a week, for a total of 10 episodes. Contestants are randomly chosen for one of those 10 episodes, but you don't find out which one you will be on until the last minute. Randomness and secrecy have become a big part of the production process in game shows since the quiz show scandals of the 1950s. (Interesting fact: it is a Federal felony to tamper with a TV game show.) There was a group of just under 20 of us altogether waiting to go on. All very nice people. Here in the New York area one tends to equate very intelligent people with "Type A" personalities, but we got on well together. Interspersed between the tapings, we had short practice sessions to familiarize ourselves with the studio setup, and this is where I first started getting a little apprehensive. Jeopardy! is not just about general knowledge; there is a physical element too, in that you have to buzz in by pressing a button on a stick which is wired up to some producer's console. You can't buzz in before Alex has finished reading the question, and obviously you can't buzz in if someone is already answering the question. If you buzz at the wrong time, your buzzer locks for 0.75 seconds, so just jamming your finger on the button doesn't work. I didn't even get that far, as on the entire first day I failed to buzz in at all, despite changing hands, changing grips and trying other strategies. I was pretty despondent by the second day, but then miraculously I managed to get the hang of it. So when I was finally selected (for the 3rd game of that day) I was back to my usual cool self. I was up against Dave Leach and Nicholas Campiz, who were returning after having tied in the previous game. And the game did start well enough.
Oh, one thing you may notice is that Alex mispronounces my name as Jaldhal. I knew he did that once or twice and even mentioned it to a producer, but it was only on viewing the episode right now that I saw he does it consistently. I assumed such things would be fixed up in post-editing, but I guess not. Oh well, after living in this country for so many years I've learned to be happy whenever two consonants are more or less in the right place. This particular mispronunciation is actually fairly common; even Indian people have been known to make it. The alternative theory is that Alex couldn't read my horrible handwriting, which is also plausible. Anyway, I digress. I started off fairly well. My first mistake was in a category about alcoholic drinks. I answered vermouth when the correct response was gin. In my defense, I'm a strict teetotaler, so none of these words mean anything to me. I also earned the disapproval of my superhero fan son by mistaking Dr. Strange for Dr. Doom. I did get the one about Spider-Man villain Dr. Octopus, so that mollified him somewhat. The highlight of my evening was, strangely enough, in a category about books of the Bible. I got a Daily Double. It was a risky move, but I have always wanted to make a "true" daily double, so I threw caution to the winds and wagered everything I had. To my surprise as much as anyone's, I got the answer right, doubling my score. I was in a pretty good position for Double Jeopardy, but that's when Dave started dominating. If you haven't been watching previous games, Dave Leach is arguably the best player this season. Not only is he smart, but he is almost supernaturally fast on the buzzer. You can't see it because I kept my hands behind the podium, but again and again I would desperately try to ring in on a question I knew the answer to, only to be beaten to the punch by Dave. This caused me to start getting a bit reckless and answering questions where I was less than certain, which cost me dearly. 
One particularly dumb mistake was in a category where all the answers were supposed to start with the letter g. I said compass when I should have said gyrocompass, but actually I should have left that category alone entirely. By the time Final Jeopardy came around, winning was no longer an option, as Dave was too far in the lead. I was slightly ahead of Nicholas in 2nd place. This is where I again made a strategic error. My original plan was to bet nothing in Final Jeopardy, but the category was "1930s Novels" and I liked my chances. The clue was read out, and it involved an anti-war novel whose author was blacklisted. I was elated. It had to be "All Quiet on the Western Front", right? The blacklist bit gave me pause, as the author Erich Maria Remarque was German and this was obviously a reference to the anti-Communist blacklist in the 1950s USA. But somehow I remembered that he had spent the end of his life in Hollywood, or maybe I just convinced myself that it was plausible. Evidently my thought process wasn't too crazy, because all three of us came up with the same answer. All three of us were wrong. The correct answer was "Johnny Got His Gun" by Dalton Trumbo. In hindsight I knew this, but it illustrates the strange ways the mind works. I knew it because the video for Metallica's song "One" from "...And Justice for All" is based on the film adaptation of this book. However, in a previous orientation session the lady who is the contestant coordinator for Jeopardy! had mentioned AQOTWF as an example of something, and this got stuck in my brain, crowding out the previously known fact. So when all was said and done, I ended up in 3rd place. (I bet big, whereas Nicholas was more conservative.) Oh well. Despite not winning, I had a blast. I fulfilled my dream of appearing on Jeopardy! I met Alex Trebek. (And I should be receiving an autographed picture of us together soon.) My witty banter was sufficiently witty. I made enough money to cover my travel expenses and still have a nice amount left. 
And I got to meet lots of interesting and wonderful people. So I'm satisfied.

9 October 2011

Erich Schubert: Class management system

Dear Lazyweb.
A friend of mine is looking for a small web application to manage tiny classes (as in course, not as in computing). They usually span just four dates, and people will often sign up for the next class then. Usually 10-20 people per class, although some might not sign up via internet.
We deliberately don't want to require them to fully register for the web site and go through all that registration, email verification etc. trouble. Anything that takes more than filling out the obviously required form will just cause trouble.
At first it sounded like this is a common task, but in essence all the systems I've seen so far are totally overpowered for this. There are no grades, no working groups, no "customer relationship management". There isn't much more needed than the ability to easily configure the classes, have people book them, and get the list of signed-up users into a spreadsheet easily (CSV will do).
It must be able to run on the typical PHP+MySQL web hoster and be open source.
Any recommendations? Drop me a comment or email at erich () debian org. Thank you.

28 September 2011

Erich Schubert: Privacy in the public opinion

Many people in the United States seem to hold the opinion that the "public is willing to give up most of their privacy", in particular when dealing with online services such as Facebook. I believe in his keynote at ECML-PKDD, Albert-László Barabási of Harvard University expressed such a view, that this data will just become more and more available. I'm not sure if it was him or someone else (I believe it was someone else) who essentially claimed "privacy is irrelevant". Another popular opinion is that "it's only old people caring for privacy".
However, just like in politics, these things tend to oscillate from one extreme to the other. For example, in recent years in Europe, conservative parties were winning one election after another. Now in France, the socialist parties have just won the Senate, the conservative parties in Germany are losing in one state after the other, and so on. And this will change back again, too. Democracy also lives on changing roles in government, as this both drives progress and fights corruption.
We might be seeing the one extreme in the United States right now, where people readily give away their location and interests for free access to a web site. This can swing back any time.
In Germany, one of the government parties - the liberal democrats, FDP - just dropped out of the Berlin city government, down to 1.8% of voters. Yes, this is the party the German foreign minister, Guido Westerwelle, is from. The pirate party [en.wikipedia.org] - much of their program is about privacy, civil rights, copyright reform and the internet - which didn't even participate in the previous elections since they were only founded 5 years ago, jumped to 8.9%, scoring higher than the liberal democrats did in the previous elections. In 2009 they scored a surprisingly high 2% in the federal elections - current polls see them anywhere from 4% to 7% at the federal level, so they will probably get seats in parliament in 2013. (There are also other reasons why the liberal democrats have been losing voters so badly, though! Their current numbers indicate they might drop out of parliament in 2013.)
The Greens in Germany, which are also very much oriented towards privacy and civil rights, are also on the rise, and in March became the second strongest party and senior partner in the governing coalition of Baden-Württemberg, which historically was a heartland of the conservatives.
So don't assume that privacy is irrelevant nowadays. Public opinion can swing quickly. In particular in democratic systems that have room for more than two parties - so probably not in the United States - such topics can actually influence elections a lot. Within 30 years of their founding, the Greens have come to frequently reach 20% in federal polls and up to 30% in some states. It doesn't look as if they are going to go away soon.
Also don't assume that it's just old people caring about privacy - in Germany, in particular the pirate party and the Greens are very much favored by young people. The typical pirate voter is less than 30 years old, male, has a higher education, and works in the media or internet business.
In Germany, much of the protest for more privacy - and against the all-too-ready data collection by companies such as Facebook and Google - is driven by young internet users and workers. I believe this will be similar in other parts of Europe - there are pirate parties all over Europe. And this can happen in the United States any time, too.
Electronic freedom - as pushed by the Electronic Frontier Foundation, for example, but also the open source movement - does have quite a history in the United States. But in particular open source has made such huge progress in the last decade that these movements in the US could just be a bit out of breath right now. I'm sure they will come back with a strong push against the privacy invasions we're seeing right now. And that can likely take down a giant like Facebook, too. So don't bet on people continuing to give up their privacy!

29 August 2011

Erich Schubert: ELKI 0.4 beta release

Two weeks ago, I've published the first beta of the upcoming ELKI 0.4 release. The accompanying publication at SSTD 2011 won the "Best Demonstration Paper Award"!
ELKI is a Java framework for developing data mining algorithms and index structures. It has indexes such as the R*-Tree and M-Tree, and a huge collection of algorithms and distance functions. These are all written rather generically, so you can build all combinations of indexes, algorithms and distances. There are also evaluation and visualization modules.
Note that I'm using "data mining" in the broad, original sense that focuses on knowledge discovery by unsupervised methods such as clustering and outlier detection. Today, many people just think of machine learning and "artificial intelligence" - or even worse: large scale data collecting - when they hear data mining. But there is much more to it than just learning!
Java comes at a certain price. The latest version already got around 50% faster than the previous release, just by reducing Java boxing and unboxing, which puts quite some pressure on the memory management. So you could implement these things in C to get a lot faster; but this is not production software. I need code that I can put students on to work with and extend; this is much more important than getting the maximum speed. You can probably still use this for prototyping: see what works, then implement just the parts you really need in a low-level language for maximum performance.
You can do some of that in Java. You could work on a large chunk of doubles, and access them via the Unsafe class. But is that then still Java, or aren't you actually doing just plain C? In our framework, we want to be able to support non-numerical vectors and non-double distances, too. Even when they are only applicable to certain specialized use cases. Plus, generic and Java-style code is usually much more readable, and the performance cost is not critical for research use.
Release 0.4 has plenty of under-the-hood changes. It allows multiple indexes to exist in parallel, and it supports multi-relational data. There are also a dozen new algorithms, mostly from the geo/spatial outlier field, which were used for the demonstration. But it also includes, for example, methods for rescaling the output of outlier detection methods to a more sensible numerical scale for visualization and comparison.
You can install ELKI on a Debian testing or unstable system with the usual "aptitude install elki" command. It will install a menu entry for the UI and also includes the command-line launcher "elki-cli" for batch operation. The "-h" flag produces extensive online help, or you can just copy the parameters from the GUI. By reusing Java packages such as Batik and FOP already in Debian, this also makes for a smaller download. I guess the package will at some point also transition to Ubuntu - since it is Java, you can just download and install it there anyway.

22 August 2011

Erich Schubert: Missing in Firefox: anon and regular mode in parallel

The killer feature that Chrome has and Firefox is missing is quite simple: the ability to have "private" aka "anonymous" and non-private tabs open at the same time. As far as I can tell, with Firefox you can only be in one of these modes.
My vision of a next generation privacy browser would essentially allow you to have tabs in individual modes, with some simple rules for mode switching. Essentially, unknown sites should always be opened in anonymous mode. Only sites that I register for should automatically switch to a tracked mode where cookies are kept, for my own convenience. And then there are sites in a safety category that should be isolated from any other contents, such as my banking site. Going to these sites should require me to manually switch modes (except when using my bookmark). Embedding or framing these sites should be impossible.
On a side note, having TOR tabs would also be nice.

16 August 2011

Erich Schubert: Documenting fat-jar licenses

Dear Lazyweb,
What is the appropriate way to document the individual licenses of a fat jar (a jar archive that includes the program along with all its dependencies)?
I've been living in the happy world of Linux distributions, where one would just package the dependencies independently (or actually just specify which existing packages you use, since most are already packaged by helpful other developers). But for the non-Linux people, a fat jar seems to be important so they can double-click it to run the application.
When building larger Java applications, you end up using a fair amount of external code, such as the various Apache Commons libraries and Apache Batik. I'm currently including all their LICENSE.txt files in a directory named "legal", and I'm trying to make it as obvious as possible which license applies to which parts of the jar archive. Is there any best practice for doing this? I don't want to reinvent the wheel; I'd also like to avoid any common legal pitfalls, obviously.
Feel free to respond either using the Disqus comment function or by email via erich () debian org

26 July 2011

Erich Schubert: Restricting Skype via iptables

Whenever I launch Skype on my computer, it gets banned from the university network within a few minutes; the ban expires again a few minutes after I close Skype. This is likely due to the aggressive nature of Skype; maybe the firewalls think it is trying to do a DDoS attack. One of the known big issues of using Skype.
For Windows users, there are some known workarounds to limit Skype that usually involve registry editing. These are unfortunately not available on Linux.
Therefore, I decided to play around with advanced iptables functionality. You cannot match the originating process reliably (the owner match module seemed to include such functionality at some point, but it was deemed unreliable on multi-core systems). However, there are other and more efficient methods of achieving the same effect.
Here's my setup:
# Add a system group for Skype
addgroup --system skype
# Make skype setgid "skype", so its processes run with that group
# (assuming the Debian package!)
dpkg-statoverride --update --add root skype 2755 "$(which skype)"
And these are the iptables rules I use:
iptables -I OUTPUT -p tcp -m owner --gid-owner skype \
    -m multiport ! --dports 80,443 -j REJECT
iptables -I OUTPUT -p udp -m owner --gid-owner skype -j REJECT
They allow outgoing connections by Skype only on ports 80 and 443, which supposedly do not trigger the firewall (in fact, this filter is recommended by our network administration for Skype).
Or wrapped as pyroman (my firewall configuration tool; aptitude install pyroman) module:
"""
Skype restriction to avoid firewall block.
Raw iptables commands.
"""
iptables(Firewall.output, "-p tcp -m owner --gid-owner skype -m multiport ! --dports 80,443 -j %s" % Firewall.reject)
iptables(Firewall.output, "-p udp -m owner --gid-owner skype -j %s" % Firewall.reject)
which I've put just after the conntrack default module, as 05_skype.py
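For convenience, the steps above can also be collected into a single setup script. This is only a sketch of what was described: it must run as root, it assumes the Debian skype package, and it does not make the rules persistent across reboots.

```shell
#!/bin/sh
# One-shot setup combining the steps above (a sketch; run once, as root,
# on a system with the Debian skype package installed).
set -e

# System group for Skype, created only if missing:
getent group skype >/dev/null || addgroup --system skype

# Make the skype binary setgid "skype":
dpkg-statoverride --update --add root skype 2755 "$(which skype)"

# Only outgoing TCP on ports 80/443 is allowed for the skype group;
# everything else (including all UDP) is rejected:
iptables -I OUTPUT -p tcp -m owner --gid-owner skype \
    -m multiport ! --dports 80,443 -j REJECT
iptables -I OUTPUT -p udp -m owner --gid-owner skype -j REJECT
```
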

3 July 2011

Erich Schubert: Google vs. Facebook

Let the games begin. It looks like Google and Facebook are going to fight it out.
Given the spectacular failures of Wave and Buzz, people are of course skeptical about the success of Google Plus. However, I'm rather confident it is going to stay. Here are some things I like about it:
Some people think that Google will not stand a chance against social giant Facebook. But after all, Google has more users - and they have tons of services people like to use. So when Google Maps has Plus integration, will the users use it to chat about where to meet, or will they go the long way and post the map URL on Facebook, without the option of updating it collaboratively?
Google's position is much stronger than many people believe, once you think about the integration possibilities. The current Plus is just a fragment, the missing puzzle piece connecting the other apps. But imagine that Google now Plus-connects its various services: YouTube, Maps, Mail, Talk, ... - for them, this is just a matter of some engineering. I guess probably half of these are already in internal testing. And Facebook just can't keep up there. Sure, they do have Facebook Video. But actually, most people prefer YouTube. And while Google can integrate Plus all the way there, Facebook cannot. And while Google is the master of search, Facebook is particularly weak there - they can't even properly search their own data. Google, however, will at some point offer a "find things that interest me" button; Sparks, where you have to manually define your interests (which will probably remain an option due to privacy concerns!), is just the beginning; it is way too static right now.
So essentially, Google doesn't need to copy Facebook. They just need to do what is obvious on their own products, and Facebook will have a hard time keeping up.
Plus, in my opinion, Google got the timing just right. Users aren't too happy with Facebook these days, but there is just no big alternative around anymore; their friends are on Facebook and not on some other social network. Facebook doesn't seem to evolve much anymore. The mail functionality opened more security holes (apparently you can post to a group with a fake user name when you spoof the sender's email address) than it contributed functionality and usefulness. Privacy is still not in line with the laws of countries such as Germany; but Facebook essentially keeps telling those users that it doesn't care. Spam and fraud still reappear every month, following the same pattern again and again (clickjacking). The search function of Facebook is still usually described as "useless"... People waste time in games and annoy their friends with random game invitations and posts. Facebook had better make a major move now, too - more than a demo of Skype integration. But whenever Facebook changed, their users complained ...

Of course, Google+ still has a long way to go, too. There are still many things missing, for example groups and events. I figure Google is already testing them in "dogfood", and they'll actually come out within the month. With groups I do not refer to the existing Google product, but to what would be "public circles" that you need to join and that are accessible to all circle members instead of just the creator. And events are also a key function; probably one of the most used on Facebook. These may require much more careful design to integrate well with Calendar. But given the visual update of Calendar these days, this may just be around the corner, too.

4 June 2011

Mike (stew) O'Connor: My Movein Script

Erich Schubert's blog post reminded me that I've been meaning to write up a post detailing how I'm keeping parts of my $HOME in git repositories. My goal has been to keep my home directory in a version control system effectively. I have a number of constraints, however. I want the system to be modular. I don't always need X related config files in my home directory. Sometimes I want just my zsh related files and my emacs related files. I have multiple machines I check email from, and on those I want to keep my notmuch/offlineimap files in sync, but I don't need these on every machine I'm on, especially since those configurations have more sensitive data. I played around with laysvn for a while, but it never really seemed comfortable. I more recently discovered that madduck had started a "vcs-home" website and mailing list, talking about doing what I'm trying to do. I'm now going with madduck's idea of using git with detached work trees, so that I can have multiple git repositories all using $HOME as their $GIT_WORK_TREE. I have a script inspired by his vcsh script that will create a subshell where the GIT_DIR and GIT_WORK_TREE variables are set for me. I can do my git operations related to just one git repository in that shell, while still operating directly on my config files in $HOME, and avoiding any kind of nasty symlinking or hardlinking. Since I am usually using my script to allow me to quickly "move in" to a new host, I named my script "movein". It can be found here. Here's how I'll typically use it:
    stew@guppy:~$ movein init
    git server hostname? git.vireo.org
    path to remote repositories? [~/git] 
    Local repository directory? [~/.movein] 
    Location of .mrconfig file? [~/.mrconfig] 
    stew@guppy:~$ 
This is just run once. It asks me questions about how to set up the 'movein' environment. Now I should have a .moveinrc storing the answers I gave above, a stub of a .mrconfig, and an empty .movein directory. The next thing to do is to add some of my repositories. The one I typically add on all machines is my "shell" repository. It has a .bashrc/.zshrc, an .alias file that both source, and other zsh goodies I'll generally want to have around:
    stew@guppy:~$ ls .zshrc
    ls: cannot access .zshrc: No such file or directory
    stew@guppy:~$ movein add shell
    Initialized empty Git repository in /home/stew/.movein/shell.git/
    remote: Counting objects: 42, done.
    remote: Compressing objects: 100% (39/39), done.
    remote: Total 42 (delta 18), reused 0 (delta 0)
    Unpacking objects: 100% (42/42), done.
    From ssh://git.vireo.org//home/stew/git/shell
     * [new branch]      master     -> origin/master
    stew@guppy:~$ ls .zshrc
    .zshrc
So what happened here is that the ssh://git.vireo.org/~/git/shell.git repository was cloned with GIT_WORK_TREE=~ and GIT_DIR=.movein/shell.git. My .zshrc (along with a bunch of other files) has appeared. Next perhaps I'll add my emacs config files:
    stew@guppy:~$ movein add emacs       
    Initialized empty Git repository in /home/stew/.movein/emacs.git/
    remote: Counting objects: 77, done.
    remote: Compressing objects: 100% (63/63), done.
    remote: Total 77 (delta 10), reused 0 (delta 0)
    Unpacking objects: 100% (77/77), done.
    From ssh://git.vireo.org//home/stew/git/emacs
     * [new branch]      emacs21    -> origin/emacs21
     * [new branch]      master     -> origin/master
    stew@guppy:~$ ls .emacs
    .emacs
    stew@guppy:~$ 
My remote repository has a master branch, but also an emacs21 branch, which I can use when checking out on older machines which don't yet have newer versions of emacs. Let's say I have made changes to my .zshrc file and I want to check them in. Since we are working with detached work trees, git can't immediately help us:
    stew@guppy:~$ git status
    fatal: Not a git repository (or any of the parent directories): .git
The movein script allows me to "login" to one of the repositories. It will create a subshell with GIT_WORK_TREE and GIT_DIR set. In that subshell, git operations operate as one might expect:
    stew@guppy:~ $ movein login shell
    stew@guppy:~ (shell:master>*) $ echo >> .zshrc
    stew@guppy:~ (shell:master>*) $ git add .zshrc                                       
    stew@guppy:~ (shell:master>*) $ git commit -m "adding a newline to the end of .zshrc"
    [master 81b7311] adding a newline to the end of .zshrc
     1 files changed, 1 insertions(+), 0 deletions(-)
    stew@guppy:~ (shell:master>*) $ git push
    Counting objects: 8, done.
    Delta compression using up to 2 threads.
    Compressing objects: 100% (6/6), done.
    Writing objects: 100% (6/6), 546 bytes, done.
    Total 6 (delta 4), reused 0 (delta 0)
    To ssh://git.vireo.org//home/stew/git/shell.git
       d24bf2d..81b7311  master -> master
    stew@guppy:~ (shell:master*) $ exit
    stew@guppy:~ $ 
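Under the hood, this is all just the GIT_DIR/GIT_WORK_TREE trick. The following is a local re-enactment of roughly what "movein add" does - a sketch, not the actual script: a temporary directory stands in for $HOME, a local bare repository stands in for the ssh remote, and the file contents are made up.

```shell
# Re-enact the detached-work-tree checkout locally (illustrative names).
set -e
REMOTE=$(mktemp -d)/shell.git
FAKEHOME=$(mktemp -d)           # stand-in for $HOME

# Build a "remote" bare repository containing a .zshrc:
git init -q --bare "$REMOTE"
SRC=$(mktemp -d)
git -C "$SRC" init -q
echo 'alias ll="ls -l"' > "$SRC/.zshrc"
git -C "$SRC" add .zshrc
git -C "$SRC" -c user.name=demo -c user.email=demo@example.com \
    commit -qm initial
git -C "$SRC" push -q "$REMOTE" HEAD:master

# The movein trick: the repository lives under ~/.movein,
# while the work tree is the home directory itself.
mkdir -p "$FAKEHOME/.movein"
export GIT_DIR="$FAKEHOME/.movein/shell.git" GIT_WORK_TREE="$FAKEHOME"
git init -q
git remote add origin "$REMOTE"
git fetch -q origin
git checkout -q -b master origin/master   # .zshrc materializes in $FAKEHOME
ls "$FAKEHOME/.zshrc"
```

The point is that the repository metadata lives under .movein/ while checkouts land directly in the (fake) home directory, with no symlinking or hardlinking involved.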
If I want to create a brand new repository from files in my home directory, I can:
    stew@guppy:~ $ touch methere
    stew@guppy:~ $ touch mealsothere
    stew@guppy:~ $ movein new oohlala methere mealsothere
    Initialized empty Git repository in /home/stew/git/oohlala.git/
    Initialized empty Git repository in /home/stew/.movein/oohlala.git/
    [master (root-commit) 7abe5ba] initial checkin
     0 files changed, 0 insertions(+), 0 deletions(-)
     create mode 100644 mealsothere
     create mode 100644 methere
    Counting objects: 3, done.
    Delta compression using up to 2 threads.
    Compressing objects: 100% (2/2), done.
    Writing objects: 100% (3/3), 224 bytes, done.
    Total 3 (delta 0), reused 0 (delta 0)
    To ssh://git.vireo.org//home/stew/git/oohlala.git
     * [new branch]      master -> master
Above, the command movein new oohlala methere mealsothere says "create a new repository containing two files: methere, mealsothere". A bare repository is created on the remote machine, a repository is created in the .movein directory, the files are committed, and the new commit is pushed to the remote repository. Now, on some other machine, I could run movein add oohlala to get these two new files. The movein script maintains a .mrconfig file, so that joeyh's mr tool can be used to manage the repositories in bulk. Commands like "mr update", "mr commit" and "mr push" will act on all the known repositories. Here's an example:
    stew@guppy:~ $ cat .mrconfig
    [DEFAULT]
    include = cat /usr/share/mr/git-fake-bare
    [/home/stew/.movein/emacs.git]
    checkout = git_fake_bare_checkout 'ssh://git.vireo.org//home/stew/git/emacs.git' 'emacs.git' '../../'
    [/home/stew/.movein/shell.git]
    checkout = git_fake_bare_checkout 'ssh://git.vireo.org//home/stew/git/shell.git' 'shell.git' '../../'
    [/home/stew/.movein/oohlala.git]
    checkout = git_fake_bare_checkout 'ssh://git.vireo.org//home/stew/git/oohlala.git' 'oohlala.git' '../../'
    stew@guppy:~ $ mr update
    mr update: /home/stew//home/stew/.movein/emacs.git
    From ssh://git.vireo.org//home/stew/git/emacs
     * branch            master     -> FETCH_HEAD
    Already up-to-date.
    mr update: /home/stew//home/stew/.movein/oohlala.git
    From ssh://git.vireo.org//home/stew/git/oohlala
     * branch            master     -> FETCH_HEAD
    Already up-to-date.
    mr update: /home/stew//home/stew/.movein/shell.git
    From ssh://git.vireo.org//home/stew/git/shell
     * branch            master     -> FETCH_HEAD
    Already up-to-date.
    mr update: finished (3 ok)
There are still issues I'd like to address. The big one in my mind is that there is no .gitignore, so when you "movein login somerepository" and then run "git status", it tells you about hundreds of untracked files in your home directory. Ideally, I just want to know about the files which are already associated with the repository I'm logged into.
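One standard mitigation - not part of the movein script, just a regular git configuration option - is to tell git not to list untracked files in these repositories:

```shell
# Demonstrate status.showUntrackedFiles in a throwaway repository
# (paths and file names are illustrative):
D=$(mktemp -d); cd "$D"
git init -q
touch untracked-file
git status --porcelain                  # shows "?? untracked-file"
git config status.showUntrackedFiles no
git status --porcelain                  # now prints nothing
```

Setting this per repository (inside a "movein login" subshell) keeps "git status" limited to files the repository already tracks, at the cost of never being reminded about genuinely new files.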

Mike (stew) O'Connor: My Movin Script

Erich Schubert's blog post reminded me that I've been meaning to writeup a post detailing how I'm keeping parts of my $HOME in git repositories. My goal has been to keep my home directory in a version control system effectively. I have a number of constraints however. I want the system to be modular. I don't always need X related config files in my home directory. Sometimes I want just my zsh related files and my emacs related files. I have multiple machines I check email from, and on those want to keep my notmuch/offlineimap files in sync, but I don't need these on every machine I'm on, expecially since those configurations have more sensitive data. I played around with laysvn for a while, but it never really seemed comfortable. I more recently discovered that madduck had started a "vcs-home" website and mailing list, talking about doing what I'm trying to do. I'm now going with madduck's idea of using git with detached work trees, so that I can have multiple git repositories all using $HOME as their $GIT_WORK_TREE. I have a script inspired by his vcsh script that will create a subshell where the GIT_DIR, GIT_WORK_TREE variables are set for me. I can do my git operations related to just one git repository in that shell, while still operating directly on my config files in $HOME, and avoiding any kind of nasty symlinking or hardlinking. Since I am usually using my script to allow me to quickly "move in" to a new host, I named my script "movein". It can be found here. Here's how I'll typically use it:
    stew@guppy:~$ movein init
    git server hostname? git.vireo.org
    path to remote repositories? [~/git] 
    Local repository directory? [~/.movein] 
    Location of .mrconfig file? [~/.mrconfig] 
    stew@guppy:~$ 
This is just run once. It asks me questions about how to setup the 'movein' environment. Now I should have a .moveinrc storing the answers I gave above, I have a stub of a .mrconfig, and an empty .movein directory. Next thing to do is to add some of my repositories. The one I typically add on all machines is my "shell" repository. It has a .bashrc/.zshrc, an .alias that both source and other zsh goodies I'll generally wish to be around:
    stew@guppy:~$ ls .zshrc
    ls: cannot access .zshrc: No such file or directory
    stew@guppy:~$ movein add shell
    Initialized empty Git repository in /home/stew/.movein/shell.git/
    remote: Counting objects: 42, done.
    remote: Compressing objects: 100% (39/39), done.
    remote: Total 42 (delta 18), reused 0 (delta 0)
    Unpacking objects: 100% (42/42), done.
    From ssh://git.vireo.org//home/stew/git/shell
     * [new branch]      master     -> origin/master
    stew@guppy:~$ ls .zshrc
    .zshrc
So what happened here is that the ssh://git.vireo.org/~/git/shell.git repository was cloned with GIT_WORK_TREE=~ and GIT_DIR=.movein/shell.git. My .zshrc (along with a bunch of other files) has appeared. Next perhaps I'll add my emacs config files:
    stew@guppy:~$ movein add emacs       
    Initialized empty Git repository in /home/stew/.movein/emacs.git/
    remote: Counting objects: 77, done.
    remote: Compressing objects: 100% (63/63), done.
    remote: Total 77 (delta 10), reused 0 (delta 0)
    Unpacking objects: 100% (77/77), done.
    From ssh://git.vireo.org//home/stew/git/emacs
     * [new branch]      emacs21    -> origin/emacs21
     * [new branch]      master     -> origin/master
    stew@guppy:~$ ls .emacs
    .emacs
    stew@guppy:~$ 
My remote repositry has a master branch, but also has an emacs21 branch, which I can use when checking out on older machines which don't yet have newer versions of emacs. Let's say I have made changes to my .zshrc file, and I want to check them in. Since we are working with detached work trees, git can't immediately help us:
    stew@guppy:~$ git status
    fatal: Not a git repository (or any of the parent directories): .git
The movein script allows me to "login" to one of the repositories. It will create a subshell with GIT_WORK_TREE and GIT_DIR set. In that subshell, git operations operate as one might expect:
    stew@guppy:~ $ movein login shell
    stew@guppy:~ (shell:master>*) $ echo >> .zshrc
    stew@guppy:~ (shell:master>*) $ git add .zshrc                                       
    stew@guppy:~ (shell:master>*) $ git commit -m "adding a newline to the end of .zshrc"
    [master 81b7311] adding a newline to the end of .zshrc
     1 files changed, 1 insertions(+), 0 deletions(-)
    stew@guppy:~ (shell:master>*) $ git push
    Counting objects: 8, done.
    Delta compression using up to 2 threads.
    Compressing objects: 100% (6/6), done.
    Writing objects: 100% (6/6), 546 bytes, done.
    Total 6 (delta 4), reused 0 (delta 0)
    To ssh://git.vireo.org//home/stew/git/shell.git
       d24bf2d..81b7311  master -> master
    stew@guppy:~ (shell:master*) $ exit
    stew@guppy:~ $ 
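Judging from the prompt, `movein login` is essentially a subshell with those two variables exported. A minimal stand-in (hypothetical; the real script evidently also decorates the prompt) could look like this:

```shell
# hypothetical stand-in for `movein login <repo> [command]`:
# with no command it starts an interactive subshell, like the real script.
movein_login() {
    repo=$1; shift
    if [ $# -eq 0 ]; then set -- "${SHELL:-/bin/sh}"; fi
    GIT_DIR="$HOME/.movein/$repo.git" GIT_WORK_TREE="$HOME" "$@"
}
```

Inside that subshell, plain git commands act on the chosen repository, and `exit` returns to the normal environment.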
If I want to create a brand new repository from files in my home directory, I can:
    stew@guppy:~ $ touch methere
    stew@guppy:~ $ touch mealsothere
    stew@guppy:~ $ movein new oohlala methere mealsothere
    Initialized empty Git repository in /home/stew/git/oohlala.git/
    Initialized empty Git repository in /home/stew/.movein/oohlala.git/
    [master (root-commit) 7abe5ba] initial checkin
     0 files changed, 0 insertions(+), 0 deletions(-)
     create mode 100644 mealsothere
     create mode 100644 methere
    Counting objects: 3, done.
    Delta compression using up to 2 threads.
    Compressing objects: 100% (2/2), done.
    Writing objects: 100% (3/3), 224 bytes, done.
    Total 3 (delta 0), reused 0 (delta 0)
    To ssh://git.vireo.org//home/stew/git/oohlala.git
     * [new branch]      master -> master
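Here is a hedged sketch of what `movein new` appears to do, using a local directory in place of the remote host (the actual script talks to the remote over ssh):

```shell
set -e
home=$(mktemp -d)                       # stand-in for $HOME
remote=$(mktemp -d)                     # stand-in for git.vireo.org
mkdir -p "$home/.movein"
git init -q --bare "$remote/oohlala.git"          # bare repo on the "remote"
git init -q --bare "$home/.movein/oohlala.git"    # local fake-bare repo
cd "$home"; touch methere mealsothere
g() { git --git-dir="$home/.movein/oohlala.git" --work-tree="$home" "$@"; }
g add methere mealsothere
g -c user.name=demo -c user.email=demo@example.com commit -qm "initial checkin"
g push -q "$remote/oohlala.git" HEAD:refs/heads/master
```

After the push, the "remote" bare repository contains both files, and any other machine could clone from it.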
Above, the command movein new oohlala methere mealsothere says "create a new repository containing two files: methere and mealsothere". A bare repository is created on the remote machine, a repository is created in the .movein directory, the files are committed, and the new commit is pushed to the remote repository. Now, on some other machine, I could run movein add oohlala to get these two new files. The movein script maintains a .mrconfig file, so that joeyh's mr tool can be used to manage the repositories in bulk. Commands like "mr update", "mr commit", and "mr push" will act on all the known repositories. Here's an example:
    stew@guppy:~ $ cat .mrconfig
    [DEFAULT]
    include = cat /usr/share/mr/git-fake-bare
    [/home/stew/.movein/emacs.git]
    checkout = git_fake_bare_checkout 'ssh://git.vireo.org//home/stew/git/emacs.git' 'emacs.git' '../../'
    [/home/stew/.movein/shell.git]
    checkout = git_fake_bare_checkout 'ssh://git.vireo.org//home/stew/git/shell.git' 'shell.git' '../../'
    [/home/stew/.movein/oohlala.git]
    checkout = git_fake_bare_checkout 'ssh://git.vireo.org//home/stew/git/oohlala.git' 'oohlala.git' '../../'
    stew@guppy:~ $ mr update
    mr update: /home/stew//home/stew/.movein/emacs.git
    From ssh://git.vireo.org//home/stew/git/emacs
     * branch            master     -> FETCH_HEAD
    Already up-to-date.
    mr update: /home/stew//home/stew/.movein/oohlala.git
    From ssh://git.vireo.org//home/stew/git/oohlala
     * branch            master     -> FETCH_HEAD
    Already up-to-date.
    mr update: /home/stew//home/stew/.movein/shell.git
    From ssh://git.vireo.org//home/stew/git/shell
     * branch            master     -> FETCH_HEAD
    Already up-to-date.
    mr update: finished (3 ok)
There are still issues I'd like to address. The big one in my mind is that there is no .gitignore, so when you "movein login somerepository" and then run "git status", it tells you about hundreds of untracked files in your home directory. Ideally, I just want to know about the files which are already associated with the repository I'm logged into.
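One mitigation, not part of the movein script but plain git, is the status.showUntrackedFiles option, which can be set per repository so that git status stays quiet about unrelated files in the work tree:

```shell
set -e
repo=$(mktemp -d)                      # throwaway repo to demonstrate the option
cd "$repo" && git init -q
git config status.showUntrackedFiles no
touch stray-file                       # untracked, but status won't mention it
git status --porcelain                 # prints nothing
```

Set inside a movein repository (while "logged in"), this would hide the hundreds of untracked home-directory files; passing -u on the command line still shows them on demand.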

2 June 2011

Erich Schubert: Managing user configuration files

Dear Lazyweb,
How do you manage your user configuration files? I have around four home directories I frequently use. They are reasonably well in sync, but I have been considering actually using some file management to synchronize them better. I'm talking about files such as the shell config, ssh config, .vimrc etc.
I had some discussions about this before, and the consensus had been that some version control system probably is best. Git seemed to be a good candidate; I remember having read about things like this a dozen years ago when CVS was still common and Subversion was new.
So dear lazyweb, what are your experiences with managing your user configuration? What setup would you recommend?
Update: See vcs-home for various related links and at least five different ways of doing this. mr, a multi-repository VCS wrapper, seems particularly well suited to this.

27 May 2011

Erich Schubert: Dear Lazyweb, how to write multi-locale python code

Dear Lazyweb,
I've been toying around with a Python WSGI application, i.e. a multi-threaded persistent web application. Now I'd like to add multi-language support to this application. I need to format datetimes in human-readable formats, but I haven't found a sane way to do this with strftime yet. Essentially, strftime will use the current application locale; however, since I'm running multi-threaded, different threads might want to use different locales, so changing the locale is bound to cause race conditions.
So what is the best way to pretty-print datetime (including weekday names!), currency and similar values in a multi-threaded, multi-locale context in Python? Gettext and manually emulating strftime doesn't sound that sensible to me. And of course, I don't want to have to translate the weekday names myself into every language I choose to support...
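One thread-safe approach (my suggestion, not something from the post) is the third-party Babel library: its formatting functions take the locale as an explicit per-call argument, so no process-wide locale ever needs to be switched and threads cannot race on it:

```python
from datetime import datetime

# Babel ships its own CLDR locale data, including translated weekday and
# month names, so nothing has to be translated by hand.
from babel.dates import format_datetime

dt = datetime(2011, 5, 27, 14, 30)
# the locale is a plain argument; different threads can pass different values
print(format_datetime(dt, "EEEE, d MMMM yyyy HH:mm", locale="de_DE"))
print(format_datetime(dt, "EEEE, d MMMM yyyy HH:mm", locale="en_US"))
```

Babel also has format_currency and format_number with the same per-call locale parameter, which covers the currency case mentioned above.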
